Generalization Error Bounds for Aggregation by Mirror Descent with Averaging
نویسندگان
چکیده
We consider the problem of constructing an aggregated estimator from a finite class of base functions which approximately minimizes a convex risk functional under the l1 constraint. For this purpose, we propose a stochastic procedure, the mirror descent, which performs gradient descent in the dual space. The generated estimates are additionally averaged in a recursive fashion with specific weights. Mirror descent algorithms have been developed in different contexts and they are known to be particularly efficient in high dimensional problems. Moreover their implementation is adapted to the online setting. The main result of the paper is the upper bound on the convergence rate for the generalization error.
منابع مشابه
A survey of Algorithms and Analysis for Adaptive Online Learning
We present tools for the analysis of Follow-The-Regularized-Leader (FTRL), Dual Averaging, and Mirror Descent algorithms when the regularizer (equivalently, proxfunction or learning rate schedule) is chosen adaptively based on the data. Adaptivity can be used to prove regret bounds that hold on every round, and also allows for data-dependent regret bounds as in AdaGrad-style algorithms (e.g., O...
متن کاملAnalysis Techniques for Adaptive Online Learning
We present tools for the analysis of Follow-The-Regularized-Leader (FTRL), Dual Averaging, and Mirror Descent algorithms when the regularizer (equivalently, proxfunction or learning rate schedule) is chosen adaptively based on the data. Adaptivity can be used to prove regret bounds that hold on every round, and also allows for data-dependent regret bounds as in AdaGrad-style algorithms (e.g., O...
متن کاملParallelizing Stochastic Approximation Through Mini-Batching and Tail-Averaging
This work characterizes the benefits of averaging techniques widely used in conjunction with stochastic gradient descent (SGD). In particular, this work sharply analyzes: (1) mini-batching, a method of averaging many samples of the gradient to both reduce the variance of a stochastic gradient estimate and for parallelizing SGD and (2) tail-averaging, a method involving averaging the final few i...
متن کاملRecursive Aggregation of Estimators by the Mirror Descent Algorithm with Averaging
We consider a recursive algorithm to construct an aggregated estimator from a finite number of base decision rules in the classification problem. The estimator approximately minimizes a convex risk functional under the l1-constraint. It is defined by a stochastic version of the mirror descent algorithm (i.e., of the method which performs gradient descent in the dual space) with an additional av...
متن کاملGeneralization Error Bounds with Probabilistic Guarantee for SGD in Nonconvex Optimization
The success of deep learning has led to a rising interest in the generalization property of the stochastic gradient descent (SGD) method, and stability is one popular approach to study it. Existing works based on stability have studied nonconvex loss functions, but only considered the generalization error of the SGD in expectation. In this paper, we establish various generalization error bounds...
متن کامل